ABSTRACT

Genetic algorithms (GAs) are optimization
techniques that are based on the mechanics of 
evolution and natural selection. They take 
advantage of the power of cumulative selection, 
in which successive incremental improvements 
in a solution structure become the basis for 
continued development. A GA is an iterative 
procedure that maintains a "population" of 
"organisms" (candidate solutions). Through 
successive "generations" (iterations) the 
population as a whole improves in a simulation 
of Darwinism's "survival of the fittest". GAs 
have been shown to be successful where noise 
significantly reduces the ability of other search 
techniques to work effectively. 

Satellite altimetry provides useful information 
about oceanographic phenomena. It provides 
rapid global coverage of the oceans and is not 
as severely hampered by cloud cover as 
infrared imagery. Despite these and other 
benefits, several factors lead to significant 
difficulty in interpretation. 

The GA approach to the improved interpreta- 
tion of satellite data involves the representation 
of the ocean surface model as a string of 
parameters or coefficients from the model. The 
GA searches in parallel a population of such 
representations (organisms) to obtain the 
individual that is best suited to "survive", that 
is, the fittest as measured with respect to some
"fitness" function. The fittest organism is the
one that best represents the ocean surface 
model with respect to the altimeter data. 


1. INTRODUCTION 

Much useful information about oceanographic 
phenomena can be obtained from an altimeter 
borne on a satellite. In addition to providing 
rapid global coverage of the oceans, satellite 
altimetry bypasses other (in-situ) measurement 
problems. It is not as severely hampered by 
cloud cover as infrared imagery, and it also 
measures oceanographic phenomena that have 
no surface thermal expression. Despite the 
benefits of altimetry, several factors lead to 
significant difficulty in interpretation. Among
these are atmospheric noise (from water vapor,
ionospheric electrons, solar activity, and so
forth); scale errors (the magnitudes of many of
the errors are greater than the phenomena
measured); the mixing of time-dependent
measurements with related time-independent
ones (the presence of the mean dynamic
topography in the reference surface or "geoid",
for example); and errors in the calculation of
the geoid itself.

In this paper we first present some background 
on the use of satellite altimetry data to measure 
the sea surface. The interpretation of these 
measurements is complicated by the difficulties 
referred to above. In order to improve our
interpretation of the altimeter data, we turned to
a technique based on an optimization procedure 
believed to operate effectively in nature. 
Known as "genetic algorithms" (GA), these 
techniques have been shown to be successful in 
many environments. Because they search in 
parallel a large portion of the solution space, 
they are able to distinguish local optima from 
global ones. GAs can successfully search 
where noise significantly reduces the ability of 
other search techniques to work effectively. 
We demonstrate the effectiveness of GAs in
fitting a model of the sea-surface height to data
obtained from satellite altimetry.


2. BACKGROUND 

A satellite-borne radar altimeter measures the
distance from its antenna's electrical center to 
the instantaneous sea surface, averaged over 
the footprint. Sea level is the difference 
between the altimeter-measured distance and 
the satellite's height; the latter is determined 
independently by tracking and orbit deter- 
mination. Then, the difference between sea 
level and the geoid, the sea surface height 
(SSH) residual, provides information on ocean 
dynamics. 

The geoid is a gravitational equipotential
surface. The marine geoid is the shape that 
would be taken by a resting ocean. Since the 
geoid does not change, and since the oceano- 
graphic component of sea-surface variations is 
generally relatively small and does change with 
time, the long-term temporal mean of sea level 
is a good approximation to the marine geoid. 

The situation is different when one tries to use 
the altimeter to measure ocean circulation. 
Then the "signal" is the small SSH residual that 
remains after the geoid is subtracted from sea 
level, and the "noise" is any error in knowledge 
of the geoid. For practical reasons, a large part 
of the information that goes into "geoids" 
comes from altimeter measurements them- 
selves. When there are permanent oceano- 
graphic features such as the Gulf Stream, there 
is a significant time-independent "dynamic 
height" component. Being time-independent, this
component cannot be distinguished from the
true geoid without additional information.
Subtraction of a geoid containing this term
from the altimeter-determined sea level may
introduce a serious error (Lybanon et al.,
1990).

One model for the sea surface height is realized 
as the difference between the expected dynamic 
height component and the reference surface 
error. This model can be tested using 
GEOSAT altimeter measurements of the Gulf 
Stream region, which has a strong mesoscale 
signal (Calman, 1987). The following model
has been proposed by Lybanon et al. (1990): 

$$\mathrm{SSH} = A\tanh[B(X - D - E)] - F\tanh[C(X - E)] - G \qquad (1)$$

where X is the along-track coordinate of the 
satellite. The first term represents the 
instantaneous Gulf Stream, the second 
represents the mean Gulf Stream, and the third 
term is an overall bias due to orbit error or 
possibly other errors. The coefficients in each 
hyperbolic tangent represent the amplitude, the 
steepness of the sloping part, and the position 
of the curve, respectively. By fitting this 
model to the altimeter data, one can add the 
modeled mean Gulf Stream profile to the 
instantaneous sea surface height to allow a 
better description of the Gulf Stream. The key 
to this proposed technique is finding the 
coefficients from Equation (1) that best fit the 
altimeter data. Lybanon et al. (1988) have 
attempted to use standard mathematical curve- 
fitting schemes, but have achieved only mixed 
success. We propose using GAs to aid this 
process. 
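
As a concrete reading of Equation (1), the model
can be written as a small function. This is only
a sketch: the function name ssh_model and the
lowercase argument names are illustrative choices,
not taken from the original work.

```python
import math


def ssh_model(x, a, b, c, d, e, f, g):
    """Evaluate Equation (1) at along-track coordinate x.

    a, b, (d + e): amplitude, steepness and position of the
                   instantaneous Gulf Stream term.
    f, c, e:       amplitude, steepness and position of the
                   mean Gulf Stream term.
    g:             overall bias.
    """
    instantaneous = a * math.tanh(b * (x - d - e))
    mean_stream = f * math.tanh(c * (x - e))
    return instantaneous - mean_stream - g
```

Far from both fronts each hyperbolic tangent
saturates at +1 or -1, so the model tends to
A - F - G on one side and -A + F - G on the other.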


3. GENETIC ALGORITHMS 

Genetic algorithms are optimization techniques 
that are based on the mechanics of evolution 
and natural selection. In contrast to other 
methods that rely on a point-to-point search of 
the domain space, GAs use a large sample of 
points from the domain. Each point, called an 
"organism", is a candidate solution of the 
problem in question. The large sample of 
candidate solutions (called a "population") is 
modified through successive iterations. Each 
modification is based on ideas taken from 
Darwinian natural selection. Although random- 
ness is a part of the process, each modification 
is guided by the candidate solutions that are 
most successful. These "fittest" organisms
contribute the most to succeeding iterations in a
simulation of "survival of the fittest". Each
successive population is called a "generation".
Thus we have an initial generation, G(0), and
for each generation G(t), the GA forms a new
one, G(t+1). An algorithm to implement GAs
is given by:


    generate initial population, G(0);
    evaluate G(0);
    t := 0;
    repeat
        generate G(t+1) using G(t);
        evaluate G(t+1);
        t := t+1;
    until solution is found.


Like all generate-and-test methods, the GA
requires the two main steps of generation and
evaluation. In order to evaluate a generation, a
fitness function is needed. In nature, a species
responds in some way to environmental
pressure. The GA analog to this pressure is the
fitness function. It is built from
domain-specific information and returns the
relative merit or fitness of the organism
(Goldberg, 1989).
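
The generate-and-evaluate loop can be sketched in
a few lines. The names run_ga, evaluate and
next_generation are hypothetical, and the stopping
rule (a target score or a generation limit) is an
assumption, since the text only says "until
solution is found". Lower scores are treated as
better, matching the least-squares fitness of
Section 3.2.

```python
def run_ga(initial_population, evaluate, next_generation,
           max_generations, target=0.0):
    """Iterate G(t) -> G(t+1) until a score reaches target or time runs out."""
    population = initial_population          # G(0)
    scores = [evaluate(o) for o in population]
    t = 0
    while min(scores) > target and t < max_generations:
        # Generate G(t+1) from G(t), then evaluate it.
        population = next_generation(population, scores)
        scores = [evaluate(o) for o in population]
        t += 1
    best = min(range(len(population)), key=scores.__getitem__)
    return population[best], scores[best], t
```

Passing the operators in as functions keeps the
loop independent of how selection, crossover and
mutation are realized.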


3.1 Representation 

Our problem entails finding coefficients A, B,
..., G which yield the best fit of the altimeter
data when used in Equation (1). The
measurement of the goodness of fit with
respect to the data D is the "fitness function".
Since we are searching for real number values
for the coefficients A, B, ..., G, the organisms
for the GA used here are vectors
$r = \langle r_A, r_B, \ldots, r_G \rangle$ of real
numbers. The fitness of such an organism is the
degree to which the model equation SSH(r)
successfully fits the data.

This view of the representation is useful at the
higher level of the curve-fitting problem.
However, the genetic algorithm works at a
lower level, the level of bits. In order to
successfully use the GA, we need to consider a
representation of the real numbers $r_i$ at the
bit level. Given upper and lower bounds for each
$r_i$, $u_i$ and $l_i$ respectively, we can look
at $r_i$ as an unsigned binary integer with $m$
bits and calculate its value with respect to
$l_i$ and $u_i$.

Given a binary integer $b$ where $b$ is in
$[0, 2^m - 1]$, we can derive its corresponding
real value using the formula

$$r = \frac{b}{2^m}(u - l) + l \qquad (2)$$

where $u$ and $l$ are the upper and lower bounds
respectively. Combining these two levels, we
construct an organism as

$$O = \langle b_{A1} b_{A2} \cdots b_{Am},\; b_{B1} b_{B2} \cdots b_{Bm},\; \ldots,\; b_{G1} b_{G2} \cdots b_{Gm} \rangle$$

where each binary integer $b_{i1} b_{i2} \cdots b_{im}$
corresponds to a real number $r_i$ which lies in
the interval $[l_i, u_i]$ for $i = A, B, \ldots, G$.
The correspondence is given in (2).

Computing the value of the fitness function of
an organism $O$ requires two steps: first,
converting each binary integer
$b_{i1} b_{i2} \cdots b_{im}$ into its
corresponding real value $r_i$; then, second,
evaluating the curve
$\mathrm{SSH}(r_A, r_B, \ldots, r_G)$ at the data
points of D.
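
The decoding step of Equation (2) might look as
follows. This is a minimal sketch in which an
organism is a flat list of 0/1 values with the
most significant bit of each coefficient first;
that bit ordering and the names decode_gene and
decode_organism are assumptions made for
illustration.

```python
def decode_gene(bits, lower, upper):
    """Map an m-bit unsigned integer to a real value using Equation (2)."""
    m = len(bits)
    b = int("".join(str(bit) for bit in bits), 2)
    return b / 2**m * (upper - lower) + lower


def decode_organism(bits, bounds, m):
    """Split a flat bit list into m-bit genes and decode each one.

    bounds is a list of (lower, upper) pairs, one per coefficient.
    """
    return [
        decode_gene(bits[i * m:(i + 1) * m], lower, upper)
        for i, (lower, upper) in enumerate(bounds)
    ]
```

Because b never exceeds 2^m - 1, the decoded value
can reach the lower bound exactly but always stays
just below the upper bound.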

3.2 Evaluation 

Since the fitness function is a measurement of
how well the organism fits the data, it focuses
the GA toward the solution. The fitness
function used here is modeled after least
squares/regression. An organism $O$ is
converted into a vector of real numbers,
$r = \langle r_A, r_B, \ldots, r_G \rangle$, using
Equation (2). The fitness is then computed as the
sum of the squares of the differences between
$\mathrm{SSH}(r; x_j)$ and $y_j$ (that is, the
residuals). Thus,

$$\mathrm{fitness}(O) = \sum_j \left(\mathrm{SSH}(r; x_j) - y_j\right)^2 \qquad (3)$$

where the summation is taken over all data
points $(x_j, y_j)$ of D. With this fitness
function, a value of 0 is considered a perfect fit
and an organism is highly fit if its fitness value
is low.
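
Equation (3) translates directly into code. The
sketch below assumes the decoded coefficients are
passed as a sequence in the order A, B, ..., G and
the data as (x, y) pairs; the function names are
illustrative.

```python
import math


def ssh_model(x, r):
    """Equation (1) with r = (A, B, C, D, E, F, G)."""
    a, b, c, d, e, f, g = r
    return a * math.tanh(b * (x - d - e)) - f * math.tanh(c * (x - e)) - g


def fitness(r, data):
    """Sum of squared residuals over all data points; lower is fitter."""
    return sum((ssh_model(x, r) - y) ** 2 for x, y in data)
```

A coefficient vector that reproduces every data
point exactly scores 0, and any misfit raises the
score.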

3.3 Convergence

The GA is designed to improve the relative 
merit of the population over time. While the
average fitness of one generation may be worse
than that of the preceding one, or while the best
solution from one generation may not be as
good as the best from a previous one, in
general, fitnesses improve as generations
unfold. Figure 1 in the appendix is an example
of this point.

In earlier generations, there is a great deal of 
variability among the organisms in a single 
generation. There is a wide range of fitness 
values in these earlier generations.
Occasionally, a few organisms are generated
whose fitnesses are exceedingly poor. This
worsens the average fitness of the overall
generation. The number of poor solutions 
generated is in proportion to the fitness of the 
generation as a whole. Thus, in the early 
generations, a larger number of poor solutions 
are formed. However, as the generations 
improve and larger numbers of the organisms 
have good fitnesses, the occasional poor 
performer does not affect the population as 
much. The effect of these less fit organisms in 
later generations is minimal. 

As the generations improve, the average fitness
stabilizes and most of the organisms become
nearly identical. This stabilization is called
"convergence", and the GA is said to converge
to the organism that appears most often. At
convergence, nearly all organisms are identical,
and this common organism is taken as the
solution to the problem.
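
One simple way to monitor this kind of convergence
is to track how much of the population is occupied
by the most common organism. The helper below is a
hypothetical diagnostic, not something described in
the original study.

```python
from collections import Counter


def convergence_ratio(population):
    """Return the most common organism and the fraction of the population it fills."""
    counts = Counter(tuple(organism) for organism in population)
    organism, count = counts.most_common(1)[0]
    return list(organism), count / len(population)
```

A ratio close to 1 indicates that nearly all
organisms are identical, which is the situation the
text calls convergence.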

3.4 Operators 

GAs are based on many of the same principles
as those found in natural selection. They
employ several operators and principles that are
generally derived from those that occur in
nature. There are three principal operators at
work in GAs: selection, crossover and
mutation. The first of these, selection, is the
analog of the principle in natural selection that
organisms that are most fit are most favorably
disposed for participation in mating, thereby
passing their genetic information to their
offspring. The selection operator chooses
individuals from the population so that those
that are more fit have a greater probability of
being selected. This focus toward the highly fit
individuals is what drives genetic algorithms.

The method of selection that was used here is 
stochastic sampling without replacement, called
"expected value" by Goldberg (1989). In
addition, we have used De Jong's (De Jong,
1975) "elitist" strategy, whereby the single best
organism from one generation is placed
unchanged into the next generation. This
strategy gives a little more weight to the best 
organism than might be achieved from selection 
alone and prevents the possibility that the best 
organism might be lost early through crossover 
or mutation. 
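
The following sketch illustrates the flavor of
expected-value selection combined with elitism. It
rests on several assumptions that the text does not
specify: errors are turned into selection weights
by subtracting them from the worst error, each
organism starts with an expected count equal to its
weight divided by the mean weight, and each use as
a parent costs 0.5 of that count. The function
names are hypothetical.

```python
import random


def elite_index(errors):
    """Index of the single best organism (lowest error), to be copied unchanged."""
    return min(range(len(errors)), key=errors.__getitem__)


def expected_value_pairs(errors, n_pairs, rng):
    """Choose parent index pairs, limiting each organism by an expected count."""
    worst = max(errors)
    # Assumption: lower error -> larger weight; tiny offset keeps weights positive.
    weights = [worst - e + 1e-12 for e in errors]
    mean_weight = sum(weights) / len(weights)
    remaining = [w / mean_weight for w in weights]   # expected offspring counts
    pairs = []
    for _ in range(n_pairs):
        pair = []
        for _ in range(2):
            eligible = [i for i, c in enumerate(remaining) if c > 0]
            if not eligible:                         # counts used up: allow everyone
                eligible = list(range(len(errors)))
            pick = rng.choices(eligible,
                               weights=[weights[i] for i in eligible])[0]
            remaining[pick] -= 0.5                   # assumed charge per mating
            pair.append(pick)
        pairs.append(tuple(pair))
    return pairs
```

Once an organism's count is used up it drops out of
the pool, which caps how often any single organism
can be chosen within one generation.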

The GA analogy to mating is called crossover. 
The crossover operator provides a mixing of 
the genetic material from the parents, and 
globally, it mixes the genetic information of the 
whole population. It is the mixing of the 
"genes", the stirring of the pot of genetic 
material, that gives the GA robustness. The 
two organisms chosen by selection are 
combined to form a new individual with 
similarities to both parents. If the mixing is 
done carefully, a large amount of genetic 
material can be tested. Although selection 
focuses the genetic algorithm, it is crossover 
that adds variety. 

We employed a "two-point" crossover. Two- 
point crossover proceeds as follows: Once the 
organisms (the "parents") have been selected 
for mating, two bit positions are chosen at 
random. The middle segments between these 
bit positions of the two organisms are 
interchanged to form two new organisms. 
These new organisms (the "offspring") are 
added to the next generation. The process of 
selection and crossover is repeated until the 
new generation has the same number of 
members as the previous generation. 
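
Two-point crossover on bit lists can be sketched as
below. The cut points are drawn from the positions
0 through n inclusive so that the exchanged middle
segment is never empty; that detail and the
function name are assumptions.

```python
import random


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the middle segment between two random cut points."""
    n = len(parent_a)
    i, j = sorted(rng.sample(range(n + 1), 2))   # two distinct cut points, i < j
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b
```

Each offspring keeps the outer segments of one
parent and receives the middle segment of the
other, so every bit position is inherited from
exactly one of the two parents.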

While selection and crossover are the chief 
operators used in GAs, there are numerous 
other minor operators proposed to strengthen 
GAs under certain circumstances. It has been 
shown that for certain applications, these minor 
operators can add to the GA's efficiency or 
prevent it from converging to a local optimum
rather than a global one. For example, it
sometimes happens that the GA converges to a
solution prematurely. This is due to the fact
that crossover only mixes the genetic material 
that is present in the initial population; it 
doesn't introduce any new material. In nature, 
new genes are introduced into a species 
through mutation. Analogously in GAs, a 
mutation operator is used to modify an 
organism occasionally in order to add new 
genetic material into the population and to 
prevent premature convergence. 

We used a mutation method that adds a real
value $\epsilon$ to (or subtracts it from) the
organism's value at one of the coefficients. We
kept the probability of a mutation low. If by
chance a particular organism was to be
mutated at one of its coefficients, $r$, then a
small $\epsilon_i$ was added to (or subtracted
from) $r$. The value of $\epsilon$ is a power of 2
ranging from 1 to $2^m$. Thus, if we let
$\epsilon_i = \langle 0 \ldots 0\,1\,0 \ldots 0 \rangle$
with 1 in the i-th position and 0's elsewhere,
then this mutation method effectively adds
$\epsilon_i$ to (or subtracts $\epsilon_i$ from)
the coefficient of the organism to be mutated.
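
A sketch of this mutation on the integer form of a
coefficient is given below. Two details are
assumptions: the power of two is drawn from the
single-bit patterns 1, 2, ..., 2^(m-1), and results
that would leave the range [0, 2^m - 1] are clamped
to it. The names mutate_coefficient and maybe_mutate
are hypothetical.

```python
import random


def mutate_coefficient(b, m, rng):
    """Add or subtract a random power of two to the m-bit integer b."""
    epsilon = 1 << rng.randrange(m)        # a single set bit among m positions
    sign = rng.choice((-1, 1))
    mutated = b + sign * epsilon
    return min(max(mutated, 0), 2**m - 1)  # assumed: clamp to the valid range


def maybe_mutate(genes, m, rate, rng):
    """With probability rate, mutate one randomly chosen coefficient."""
    genes = list(genes)
    if rng.random() < rate:
        k = rng.randrange(len(genes))
        genes[k] = mutate_coefficient(genes[k], m, rng)
    return genes
```

Unlike a plain bit flip, adding or subtracting a
power of two can carry into neighboring bits, so a
small step in the decoded real value is always
available.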

4. RESULTS 

The GA technique outlined above is dependent
on the choice of boundary points $l_i$ and $u_i$
for each i = A, B, ..., G. Knowledge of the
problem domain may be useful to ascertain
these boundaries. If the knowledge is
inadequate, a degree of experimentation may be
required. Our knowledge of the problem,
gained in part by previous work (Lybanon et
al., 1988), gave us some information about the
coefficients. We knew that the amplitudes of
the curves (coefficients A and F) were positive
and less than 1. Likewise we knew the slope
coefficients (B and C) were also positive and
less than 1. Hence, for all these values, we
used domain intervals [0, 1]. The error term G
was small, but its sign was unknown to us.
We used [-1, 1] as its domain interval. The
difficult values to determine were the position
coefficients D and E.

We began with intervals of length 100 for each
of D and E. We were prepared to test several
intervals of this length, [-50, 50], [-75, 25],
[-25, 75], and so forth. If necessary, we might
have had to increase the length of the intervals
to 200 or more. In any case, we were prepared
to do the experimentation needed for the GA.
The results given below show that we needed
only the second interval mentioned.

After several runs using the domain interval
[-50, 50] for coefficients D and E, it became
clear that this domain did not include the value 
for E, and perhaps not D either. One run 
placed the optimal E exactly at -50, indicating 
either a coincidence that we chose an endpoint 
of the domain interval very near the optimal 
value or the possibility that E lay below -50. 
The other runs had somewhat low errors, yet 
the values of the coefficients were not near each 
other. This could also mean that we stopped 
the GA too soon, before it had a chance to 
converge. See Table 1. On the strength of the 
former observation, we abandoned the [-50, 
50] interval for E in favor of [-75, 25]. If 
indeed -50 was the optimal value for E, this 
new interval would bear this out. If the error 
was due to the optimal E being smaller than 
-50, then this or another interval would be 
better. If the results for the new interval were 
likewise inconsistent, we would allow the GA 
to run for several generations longer and 
compare results. Since D did not seem to 
suffer the same error as E, we were not quick 
to adjust D's domain interval. Table 2 exhibits 
the results obtained with these new intervals. 
The value of E derived here supports our earlier 
decision to reduce the lower bound for E's 
interval. 


Table 1. Coefficients obtained with intervals
[-50, 50] for D and E.

coefficient      run #1      run #2      run #3
A               .273373     .191273     .240241
B               .435305     .499703     .504699
C               .058454     .008573     .008887
D               15.7059    -46.5097    -40.3814
E              -50.0000     13.6979     7.37317
F               .260792     .640602     .851496
G              -.028586    -.352537    -.414537
error           .612744     .903165     .825254


Table 2. Coefficients obtained with intervals
[-75, 25] for E and [-50, 50] for D.

coefficient      run #1      run #2      run #3
A               .186314     .186297     .186302
B               .757704     .757711     .757642
C               .174833     .174889     .174872
D               24.7986     24.8003     24.7997
E              -58.7977    -58.7995    -58.7989
F               .162858     .162844     .162848
G              -.025868    -.025869    -.025868
error           .558747     .558747     .558747


To contrast these results, we tried the interval 
[-25, 75]. We obtained inconsistencies in all 
runs, as might be expected, as the correct value 
of E was far removed from this interval. 
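
The two warning signs used in this discussion, a
coefficient landing on an interval endpoint and
poor agreement between runs, are easy to check
mechanically. The sketch below uses the E values
reported in Tables 1 and 2; the helper names and
the tolerance are illustrative assumptions.

```python
# Values of E from the three runs in Table 1 and Table 2.
table1_e = [-50.0000, 13.6979, 7.37317]
table2_e = [-58.7977, -58.7995, -58.7989]


def spread(values):
    """Range of a coefficient across runs; a small spread suggests consistency."""
    return max(values) - min(values)


def at_boundary(value, lower, upper, tol=1e-6):
    """True if a fitted value sits on an endpoint of its domain interval."""
    return abs(value - lower) <= tol or abs(value - upper) <= tol


# Run 1 of Table 1 placed E on the lower endpoint of [-50, 50].
suspect = at_boundary(table1_e[0], -50.0, 50.0)
```

For these data the spread of E is tens of units
under the interval [-50, 50] and a few thousandths
under [-75, 25].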

In this problem, we had some knowledge of 
the coefficients. However, one can obtain very 
accurate results with very little knowledge, if 
one is allowed to experiment. Beginning with 
a domain interval of [-100, 100] for each
coefficient, after several runs we were able to
revise our results for A, F, and G as lying in 
the interval [-5, 5]. The experiments gave us 
no information about B, C, D, or E, however. 
We ran the GA several more times on the 
revised intervals. At this point we were able to 
further narrow the intervals for some of the 
coefficients. By repeating this process through 
just four stages of interval reduction, we were 
able to obtain the results in Table 3. 

When small enough intervals are used (gained 
through either experimentation or knowledge of 
the model), one can get a very accurate fit of 
the data. With an accurate fit, the dynamic
height component can be removed, yielding a
more accurate interpretation of the data. See
Figure 2 in the appendix. It shows the original 
altimeter data of the Gulf Stream and the 
adjusted values after the dynamic component 
has been removed. 


Table 3. Coefficients obtained after several
stages of reducing the intervals.

coefficient      run #1      run #2      run #3
A               .186290     .186290     .186290
B               .757685     .757680     .757682
C               .174907     .174907     .174907
D               24.8009     24.8009     24.8009
E              -58.8000    -58.8000    -58.8000
F               .162838     .162838     .162838
G              -.025870    -.025870    -.025870
error           .558747     .558747     .558747


5. SUMMARY

We have demonstrated that genetic algorithms
can be used successfully to improve the
interpretation of altimeter data in a model for
the sea surface height. There are several
strengths to this approach. First, it does not
require complex calculation nor is it difficult to
set up. Second, it is accurate in its present
form. With 32-bit representation of integers,
we easily obtained 4, 5, or 6 significant digits.
More accuracy can be achieved with minor
revisions. Finally, the results were consistent,
although the initial genetic algorithm parameters
needed to be established at the beginning.
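
The accuracy available from the encoding can be
estimated from Equation (2): adjacent integer
values decode to reals that differ by
(u - l) / 2^m. The sketch below assumes m = 32
bits per coefficient, which is one reading of the
"32-bit representation" mentioned above.

```python
def resolution(lower, upper, m):
    """Spacing between adjacent decoded values for an m-bit coefficient."""
    return (upper - lower) / 2**m


# Interval of length 100, as used for D and E: about 2.3e-8.
step_position = resolution(-50.0, 50.0, 32)
# Unit interval, as used for A, B, C and F.
step_amplitude = resolution(0.0, 1.0, 32)
```

Under that assumption the grid is much finer than
the digits reported in the tables, so the reported
precision would be limited by the search rather
than by the encoding.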

The method suffers from some weaknesses, 
however. First, the initial set of parameters is
not universally known. There must be some
experimentation on these parameters initially 
and more experimentation on these if there is a 
significant change in the structure of the model. 
Second, the method requires some knowledge 
of the model coefficients. This knowledge can 
be gained through inspection of the data or of 
the function itself, or it can be gained through 
experimentation. In either case, the genetic 
algorithm method is a viable technique for 
improving the interpretation of the altimetry 
data used in this model.


Figure 1. Average fitness of a population, demonstrating convergence
in a genetic algorithm.

Figure 2. Original altimeter data and correction. The correction term has been
displaced for easier viewing.